Search for: All records

Creators/Authors contains: "Fan, Juanjuan"


  1. Individuals may respond to treatments with significant heterogeneity. To optimize the treatment effect, it is necessary to recommend treatments based on individual characteristics. Existing methods in the literature for learning individualized treatment regimes are usually designed for randomized studies with binary treatments. In this study, we propose an algorithm to extend random forest of interaction trees (Su et al., 2009) to accommodate multiple treatments. By integrating the generalized propensity score into the interaction tree growing process, the proposed method can handle both randomized and observational study data with multiple treatments. The performance of the proposed method, relative to existing approaches in the literature, is evaluated through simulation studies. The proposed method is applied to an assessment of multiple voluntary educational programmes at a large public university. 
  2. With the collection and availability of data on student academic performance and academic background, higher education institutions have recently stepped up initiatives in and infrastructure for learning analytics, leveraging this deluge of data to inform student success. Definitions of student success vary, from levels of specific career readiness competencies to degree completion, so the environment is fertile ground for statistical practice and collaboration among a statistically savvy yet diverse clientele of instructors, programme advisors and administrators. In this paper, we discuss our experiences to this end through a consulting project evaluating the impact of writing course class size on students achieving a graduation writing requirement. In detailing the workflow for and challenges in this project, we share three components: aspects of statistical communication and reporting; applications of innovative statistical methodology developed by our research group for handling confounding factors and correlated inputs; and training through an interdisciplinary applied institutional research professional development programme. This paper illustrates how instilling an appreciation for statistical inference through each of these components is invaluable for capturing institutional buy‐in for data‐informed decision‐making in general statistical practice. 
  3. Nocturnal hypoglycemia is a common phenomenon among patients with diabetes and can lead to a broad range of adverse events and complications. Identifying factors associated with hypoglycemia can improve glucose control and patient care. We propose a repeated measures random forest (RMRF) algorithm that can handle nonlinear relationships and interactions and the correlated responses from patients evaluated over several nights. Simulation results show that our proposed algorithm captures the informative variable more often than naïvely assuming independence. RMRF also outperforms standard random forest and extremely randomized trees algorithms. We demonstrate scenarios where RMRF attains greater prediction accuracy than generalized linear models. We apply the RMRF algorithm to analyze a diabetes study with 2524 nights from 127 patients with type 1 diabetes. We find that nocturnal hypoglycemia is associated with HbA1c, bedtime blood glucose (BG), insulin on board, time system activated, exercise intensity, and daytime hypoglycemia. The RMRF can accurately classify nights at high risk of nocturnal hypoglycemia. 
  4. Observational studies require matching across groups over multiple confounding variables. Across the literature, matching algorithms fail to handle the issue of missing data. Consequently, missing values are regularly imputed before the matching process. However, imputation is not always practical, and observations must then be dropped because the chosen algorithm cannot handle them, which decreases the power of the study and may fail to capture crucial latent information. We propose a mechanism for handling missing data within an iterative multivariate matching method. The underlying framework uses random forest as a natural tool for constructing a distance matrix, implemented with surrogate splits where there might be missing values. The output is then easily fed into an optimal matching algorithm. We apply this method to evaluate the effectiveness of supplemental instruction (SI) sessions, a voluntary program where students seek additional help, in a large-enrollment, bottleneck introductory business statistics course. This is an observational study with two groups, those who attend multiple SI sessions and those who do not, and, as is typical in educational data mining, it is challenged by missing data. Additionally, we perform a simulation study on missingness to further demonstrate the efficacy of our proposed approach. 
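The pipeline outlined in item 4 (a random-forest distance matrix fed into an optimal matching algorithm) can be illustrated with a minimal sketch. This is not the authors' implementation, and several choices here are assumptions made for illustration: the forest is trained to predict group membership, distance is taken as one minus the share of trees in which two observations fall in the same leaf, and optimal one-to-one matching is done with SciPy's `linear_sum_assignment`. The helper names `forest_distance` and `match_pairs` are hypothetical, the data are simulated, and the surrogate-split handling of missing values and the iterative step described in the abstract are not reproduced (the example uses complete data).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.ensemble import RandomForestClassifier


def forest_distance(X, treated, n_trees=200, seed=0):
    """Return an n x n matrix: 1 - fraction of trees where two rows share a leaf.

    Assumption: the forest predicts group membership from the covariates.
    """
    forest = RandomForestClassifier(
        n_estimators=n_trees, min_samples_leaf=5, random_state=seed
    )
    forest.fit(X, treated)
    leaves = forest.apply(X)  # leaf index per row and tree, shape (n, n_trees)
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    return 1.0 - same_leaf.mean(axis=2)


def match_pairs(distance, treated):
    """Pair treated and control rows so that the total distance is minimal."""
    treated_idx = np.flatnonzero(treated == 1)
    control_idx = np.flatnonzero(treated == 0)
    cost = distance[np.ix_(treated_idx, control_idx)]
    rows, cols = linear_sum_assignment(cost)  # handles rectangular cost matrices
    return [(int(treated_idx[r]), int(control_idx[c])) for r, c in zip(rows, cols)]


# Simulated complete-case data: group membership depends on the first covariate.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
treated = (rng.random(60) < 1.0 / (1.0 + np.exp(-X[:, 0]))).astype(int)

D = forest_distance(X, treated)
pairs = match_pairs(D, treated)
print(f"{len(pairs)} matched pairs, mean distance "
      f"{np.mean([D[t, c] for t, c in pairs]):.3f}")
```

Because the cost matrix may be rectangular, the smaller group is matched in full and the surplus observations in the larger group are left unmatched. A tree implementation with surrogate splits would be needed to route rows with missing covariates, as the abstract describes.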